An Embedding-Based Topic Model for Document Classification

Abstract

Topic modeling is an unsupervised learning task that discovers the hidden topics in a collection of documents. In turn, the discovered topics can be used for summarizing, organizing, and understanding the documents in the collection. Most existing techniques for topic modeling are derivatives of Latent Dirichlet Allocation, which uses a bag-of-words assumption for the documents. However, bag-of-words models completely dismiss the relationships between words. For this reason, this article presents a two-stage algorithm for topic modelling that leverages word embeddings and word co-occurrence. In the first stage, we determine the topic-word distributions by soft-clustering a random set of embedded n-grams from the documents. In the second stage, we determine the document-topic distributions by sampling the topics of each document from the topic-word distributions. This approach leverages the distributional properties of word embeddings instead of using the bag-of-words assumption. Experimental results on various data sets from an Australian compensation organization show the remarkable comparative effectiveness of the proposed algorithm in a task of document classification.
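
The two stages described in the abstract can be illustrated with a small, hedged sketch. It is not the authors' implementation: the abstract does not specify the clustering or sampling procedure, so this example assumes a softmax-based soft k-means over toy 2-D vectors, uses single words rather than n-grams, and draws one topic per token from that token's soft memberships. All function names, the toy embeddings, and the `temperature` parameter are hypothetical.

```python
import math
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def soft_assign(vec, centroids, temperature=1.0):
    """Soft memberships of vec: softmax over negative squared distances."""
    scores = [-sq_dist(vec, c) / temperature for c in centroids]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def fit_topics(embeddings, k, iterations=10):
    """Stage 1 (assumed form): soft-cluster embedded words into k topics."""
    words = list(embeddings)
    vectors = [embeddings[w] for w in words]
    # Deterministic farthest-point initialisation of the centroids.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iterations):
        resp = [soft_assign(v, centroids) for v in vectors]
        for j in range(k):
            mass = sum(r[j] for r in resp)
            if mass > 0:
                centroids[j] = [
                    sum(r[j] * v[d] for r, v in zip(resp, vectors)) / mass
                    for d in range(len(vectors[0]))
                ]
    # Topic-word distributions: memberships normalised over words per topic.
    resp = [soft_assign(v, centroids) for v in vectors]
    topic_word = []
    for j in range(k):
        mass = sum(r[j] for r in resp)
        topic_word.append({w: r[j] / mass for w, r in zip(words, resp)})
    return centroids, topic_word


def document_topics(tokens, embeddings, centroids, seed=0):
    """Stage 2 (assumed form): sample a topic per token, then normalise counts."""
    rng = random.Random(seed)
    k = len(centroids)
    counts = [0] * k
    for tok in tokens:
        if tok not in embeddings:
            continue  # skip out-of-vocabulary tokens
        probs = soft_assign(embeddings[tok], centroids)
        counts[rng.choices(range(k), weights=probs)[0]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [1.0 / k] * k


# Toy 2-D "embeddings" (made up for illustration only).
embeddings = {
    "injury": (0.0, 0.1), "fracture": (0.1, 0.0), "surgery": (0.2, 0.2),
    "invoice": (5.0, 5.1), "payment": (5.1, 5.0), "refund": (4.9, 5.0),
}
centroids, topic_word = fit_topics(embeddings, k=2)
medical = document_topics(["injury", "fracture", "surgery"], embeddings, centroids)
billing = document_topics(["invoice", "payment", "unknown"], embeddings, centroids)
print(medical, billing)
```

The resulting document-topic vectors could then serve as features for a downstream classifier, which is the use case the abstract evaluates.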

Similar resources

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A large volume of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

A link-bridged topic model for cross-domain document classification

2013 Elsevier Ltd. All rights reserved. DOI: http://dx.doi.org/10.1016/j.ipm.2013.05.002. Corresponding author at: Department of Computer Science, South China University of Technology, Guangzhou, China. Authors: P. Yang, W. Gao, Q. Tan, K.-F. Wong...

Research on Food Complains Document Classification Based-on Topic

In this paper, we design a topic-based classifier for food complaint documents and take a series of measures in the implementation process. To accomplish feature reduction, a filter method named term filtering for independent topic features is proposed to compress each topic feature vector. We introduce the created food ontology as background knowledge and to expand the semantic o...

An Ensemble Click Model for Web Document Ranking

Each year, web search engine providers spend more and more money on document ranking in search engine result pages (SERPs). Click models provide useful information for ranking documents in SERPs by modeling interactions between users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...

An Automatic Approach for Document-level Topic Model Evaluation

Topic models jointly learn topics and document-level topic distributions. Extrinsic evaluation of topic models tends to focus exclusively on topic-level evaluation, e.g. by assessing the coherence of topics. We demonstrate that there can be large discrepancies between topic- and document-level model quality, and that basing model evaluation on topic-level analysis can be highly misleading. We propo...

Journal

Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing

Year: 2021

ISSN: 2375-4699, 2375-4702

DOI: https://doi.org/10.1145/3431728